Goto

Collaborating Authors

 step-by-step instruction


The AI Safety Demo That Caused Alarm in Washington

TIME - Tech

Welcome back to, TIME's new twice-weekly newsletter about AI. If you're reading this in your browser, why not subscribe to have the next one delivered straight to your inbox? Late last year, an AI researcher opened his laptop and showed me something jaw-dropping. Lucas Hansen, co-founder of nonprofit CivAI, was showing me an app he built that coaxed popular AI models into giving what appeared to be detailed step-by-step instructions for creating poliovirus and anthrax. Any safeguards that these models had were stripped away.


SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

arXiv.org Artificial Intelligence

The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SoP, a simple yet effective framework to design jailbreak prompts automatically. Inspired by the social facilitation concept, SoP generates and optimizes multiple jailbreak characters to bypass the guardrails of the target LLM. Different from previous work which relies on proprietary LLMs or seed jailbreak templates crafted by human expertise, SoP can generate and optimize the jailbreak prompt in a cold-start scenario using open-sourced LLMs without any seed jailbreak templates. Experimental results show that SoP achieves attack success rates of 88% and 60% in bypassing the safety alignment of GPT-3.5-1106 and GPT-4, respectively. Furthermore, we extensively evaluate the transferability of the generated templates across different LLMs and held-out malicious requests, while also exploring defense strategies against the jailbreak attack designed by SoP. Code is available at https://github.com/Yang-Yan-Yang-Yan/SoP.


RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents

arXiv.org Artificial Intelligence

In this past year, large language models (LLMs) have had remarkable success in domains outside the traditional natural language processing, and people are starting to explore the usage of LLMs in more general and close to application domains like code generation, travel planning, and robot controls. Connecting these LLMs with great capacity and external tools, people are building the so-called LLM agents, which are supposed to help people do all kinds of work in everyday life. In all these domains, the prompt to the LLMs has been shown to make a big difference in what the LLM would generate and thus affect the performance of the LLM agents. Therefore, automatic prompt engineering has become an important question for many researchers and users of LLMs. In this paper, we propose a novel method, \textsc{RePrompt}, which does "gradient descent" to optimize the step-by-step instructions in the prompt of the LLM agents based on the chat history obtained from interactions with LLM agents. By optimizing the prompt, the LLM will learn how to plan in specific domains. We have used experiments in PDDL generation and travel planning to show that our method could generally improve the performance for different reasoning tasks when using the updated prompt as the initial prompt.


Bypassing the Safety Training of Open-Source LLMs with Priming Attacks

arXiv.org Artificial Intelligence

Content warning: This paper contains examples of harmful language. With the recent surge in popularity of LLMs has come an ever-increasing need for LLM safety training. In this paper, we investigate the fragility of SOTA opensource LLMs under simple, optimization-free attacks we refer to as priming attacks, which are easy to execute and effectively bypass alignment from safety training. Our proposed attack improves the Attack Success Rate on Harmful Behaviors, as measured by Llama Guard, by up to 3.3 compared to baselines. Autoregressive Large Language Models (LLMs) have emerged as powerful conversational agents widely used in user-facing applications. To ensure that LLMs cannot be used for nefarious purposes, they are extensively safety-trained for human alignment using techniques such as RLHF (Christiano et al., 2023). Despite such efforts, it is still possible to circumvent the alignment to obtain harmful outputs (Carlini et al., 2023). For instance, Zou et al. (2023) generated prompts to attack popular open-source aligned LLMs such as Llama-2 (Touvron et al., 2023a) and Vicuna (Chiang et al., 2023) to either output harmful target strings or comply with harmful behavior requests.


A New Attack Impacts ChatGPT--and No One Knows How to Stop It

WIRED

ChatGPT and its artificially intelligent siblings have been tweaked over and over to prevent troublemakers from getting them to spit out undesirable messages such as hate speech, personal information, or step-by-step instructions for building an improvised bomb. But researchers at Carnegie Mellon University last week showed that adding a simple incantation to a prompt--a string text that might look like gobbledygook to you or me but which carries subtle significance to an AI model trained on huge quantities of web data--can defy all of these defenses in several popular chatbots at once. The work suggests that the propensity for the cleverest AI chatbots to go off the rails isn't just a quirk that can be papered over with a few simple rules. Instead, it represents a more fundamental weakness that will complicate efforts to deploy the most advanced AI. "There's no way that we know of to patch this," says Zico Kolter, an associate professor at CMU involved in the study that uncovered the vulnerability, which affects several advanced AI chatbots. "We just don't know how to make them secure," Kolter adds.


Improving Cross-Task Generalization with Step-by-Step Instructions

arXiv.org Artificial Intelligence

Instruction tuning has been shown to be able to improve cross-task generalization of language models. However, it is still challenging for language models to complete the target tasks following the instructions, as the instructions are general and lack intermediate steps. To address this problem, we propose to incorporate the step-by-step instructions to help language models to decompose the tasks, which can provide the detailed and specific procedures for completing the target tasks. The step-by-step instructions are obtained automatically by prompting ChatGPT, which are further combined with the original instructions to tune language models. The extensive experiments on SUP-NATINST show that the high-quality step-by-step instructions can improve cross-task generalization across different model sizes. Moreover, the further analysis indicates the importance of the order of steps of the step-by-step instruction for the improvement. To facilitate future research, we release the step-by-step instructions and their human quality evaluation results.


10 awesomely practical tasks you can do with ChatGPT

PCWorld

ChatGPT, a powerful language model chatbot developed by OpenAI, has revolutionized the way we interact with artificial intelligence. With its advanced natural language processing capabilities, ChatGPT can help you with a wide range of tasks, from answering trivia questions to composing poetry. Here are 10 wonderfully fun, awesomely practical ways to put ChatGPT to work. One interesting thing ChatGPT can do is generate recipes based on user preferences, ingredients, or specific dietary requirements. Give it a starting point by providing certain information about the desired dish, such as the type of cuisine, the main ingredients you want to use, or any dietary restrictions you may have.


MOCA: A Modular Object-Centric Approach for Interactive Instruction Following

arXiv.org Artificial Intelligence

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for an AI agent. Recently, an `interactive instruction following' task has been proposed to foster research in reasoning over long instruction sequences that requires object interactions in a simulated environment. It involves solving open problems in vision, language and navigation literature at each step. To address this multifaceted problem, we propose a modular architecture that decouples the task into visual perception and action policy, and name it as MOCA, a Modular Object-Centric Approach. We evaluate our method on the ALFRED benchmark and empirically validate that it outperforms prior arts by significant margins in all metrics with good generalization performance (high success rate in unseen environments). Our code is available at https://github.com/gistvision/moca.


Beginning Java Programming - Programmer Books

#artificialintelligence

A comprehensive Java guide, with samples, exercises, case studies, and step-by-step instruction Beginning Java Programming: The Object Oriented Approach is a straightforward resource for getting started with one of the world's most enduringly popular programming languages. Based on classes taught by the authors, the book starts with the basics and gradually builds into more advanced concepts. The approach utilizes an integrated development environment that allows readers to immediately apply what they learn, and includes step-by-step instruction with plenty of sample programs. Each chapter contains exercises based on real-world business and educational scenarios, and the final chapter uses case studies to combine several concepts and put readers' new skills to the test. Beginning Java Programming: The Object Oriented Approach provides both the information and the tools beginners need to develop Java skills, from the general concepts of object-oriented programming.


Build Deeper: What's in the Book

#artificialintelligence

So, let's see what I've covered in the book. Build Deeper: The Path to Deep Learning The new book is the successor to my earlier book - Build Deeper: Deep Learning Beginners' Guide - (which is why I called this the'second edition), to which I've added a lot more topics this time. The new book is more than twice the length of the old book, and covers more breadth and depth in Deep Learning. Here's what you can expect in the book: A detailed explanation on what Deep Learning is, what it isn't, and how it relates to other areas in AI. What Deep Learning has achieved through the years, including recent achievements such as OpenAI, and DeepMind.